Clustering with optimised weights for Gower’s metric

Author

Mark Hoogendoorn

Abstract

In this study we formulate an algorithm that optimises the weights of Gower’s distance measure so that the cophenetic correlation coefficient (CPCC) is maximised. We use the L-BFGS algorithm with simple constraints to optimise the weights. We validate our algorithm on artificial and real datasets. We conclude that the weights are a vital component in producing a better clustering. Our algorithm raises the CPCC from 0.84 to 0.97 on a dataset based on the usage of a mobile application. Furthermore, we confirm statistically that our optimised hierarchical clustering algorithm produces better results than our benchmark algorithms on several internal quality measures. We also demonstrate that our method can be used for an optimised K-medoids implementation with comparable results. Finally, we show that an analytical derivative of the CPCC can be derived if some weights are held constant relative to one another.
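The optimisation described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: it assumes SciPy's `L-BFGS-B` solver with finite-difference gradients, average linkage, equal starting weights, and bounds of [0.001, 1] on each weight, none of which are specified in the abstract. The helper names (`per_feature_dissimilarities`, `cpcc`, `optimise_weights`) are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.optimize import minimize


def per_feature_dissimilarities(X_num, X_cat):
    """Per-feature Gower dissimilarities for every pair, shape (n_pairs, n_features)."""
    i, j = np.triu_indices(X_num.shape[0], k=1)
    ranges = X_num.max(axis=0) - X_num.min(axis=0)
    ranges = np.where(ranges == 0, 1.0, ranges)       # avoid division by zero
    d_num = np.abs(X_num[i] - X_num[j]) / ranges      # range-scaled absolute difference
    d_cat = (X_cat[i] != X_cat[j]).astype(float)      # simple mismatch for categoricals
    return np.hstack([d_num, d_cat])


def cpcc(w, D_feat, method="average"):
    """CPCC of a hierarchical clustering built on the weighted Gower distance."""
    d = D_feat @ w / w.sum()                          # condensed weighted Gower distance
    Z = linkage(d, method=method)                     # linkage method is an assumption
    c, _ = cophenet(Z, d)
    return c


def optimise_weights(D_feat, lower=1e-3, upper=1.0):
    """Maximise the CPCC over the weights with box-constrained L-BFGS."""
    p = D_feat.shape[1]
    w0 = np.ones(p)                                   # assumed start: equal weights
    base = cpcc(w0, D_feat)
    res = minimize(lambda w: -cpcc(w, D_feat), w0,
                   method="L-BFGS-B", bounds=[(lower, upper)] * p)
    # Keep the starting weights if the solver did not improve on them.
    return (res.x, -res.fun) if -res.fun >= base else (w0, base)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_num = rng.normal(size=(25, 3))                  # synthetic numeric features
    X_cat = rng.integers(0, 3, size=(25, 2))          # synthetic categorical features
    D = per_feature_dissimilarities(X_num, X_cat)
    w, best = optimise_weights(D)
    print("CPCC with equal weights:", cpcc(np.ones(D.shape[1]), D))
    print("CPCC after optimisation:", best)
```

Because the CPCC is unchanged when all weights are multiplied by the same constant, the bounds here mainly keep the weights positive and the search region finite. The gradient is estimated numerically in this sketch, whereas the abstract notes that an analytical derivative exists only under additional conditions on the weights.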

Similar resources

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...


Semi-Supervised Composite Kernel Learning Using Distance Metric Learning Techniques

The distance metric plays a key role in many machine learning and computer vision algorithms, so choosing an appropriate one directly affects their performance. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...


An Empirical Comparison of Dissimilarity Measures for Recommender Systems

Many content-based recommendation approaches rely on a dissimilarity measure computed from the product attributes. In this paper, we evaluate four dissimilarity measures for product recommendation using an online survey. In this survey, we asked users to specify which products they considered to be relevant recommendations given a reference product. We used microwave ovens as the product category. ...


An Effective Approach for Robust Metric Learning in the Presence of Label Noise

Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...
